Approximating the Longest Increasing Sequence and Distance from Sortedness in a Data Stream

Authors

  • Parikshit Gopalan
  • Robert Krauthgamer
  • Ravi Kumar
Abstract

We revisit the well-studied problem of estimating the sortedness of a data stream. We study the complementary problems of estimating the edit distance from sortedness (Ulam distance) and estimating the length of the longest increasing sequence (LIS). We present the first sub-linear space algorithms for these problems in the data stream model.

  • We give an $O(\log n)$ space, one-pass randomized algorithm that gives a $(4 + \varepsilon)$-approximation to the Ulam distance.
  • We give $O(\sqrt{n})$ space, deterministic, one-pass $(1 + \varepsilon)$-approximation algorithms for estimating the length of the LIS and the Ulam distance.
  • We show a tight lower bound of $\Omega(n)$ on the space required by any randomized algorithm to compute these quantities exactly. This improves an $\Omega(\sqrt{n})$ lower bound due to Vee et al. and shows that approximation is essential to get space-efficient algorithms.
  • We conjecture a space lower bound of $\Omega(\sqrt{n})$ on any deterministic algorithm approximating the LIS. We are able to show such a bound for a restricted class of algorithms, which nevertheless captures all the algorithms described above.

Our algorithms and lower bounds use techniques from communication complexity and property testing.

∗ Work done in part while the author was at IBM Almaden.
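As a point of reference for the two quantities named in the abstract, the sketch below computes them exactly in one pass. It is not the paper's streaming algorithm: it is the standard patience-sorting method, and it assumes that "increasing" means strictly increasing and that distance from sortedness is measured as the minimum number of elements to delete (so it equals n minus the LIS length). The function names are illustrative only. Its working array can grow to the length of the LIS, i.e. linear space in the worst case, which is the cost that approximation is meant to avoid.

```python
from bisect import bisect_left


def lis_length(stream):
    """Exact length of the longest strictly increasing subsequence.

    One pass over the input; `tails` can grow as long as the LIS itself,
    so the worst-case space is linear in the stream length.
    """
    # tails[k] = smallest possible last value of an increasing
    # subsequence of length k + 1 seen so far.
    tails = []
    for x in stream:
        k = bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)   # x extends the longest subsequence found so far
        else:
            tails[k] = x      # x gives a smaller tail for length k + 1
    return len(tails)


def distance_from_sortedness(seq):
    """Assumed definition: fewest deletions that leave a sorted sequence."""
    seq = list(seq)
    return len(seq) - lis_length(seq)


example = [1, 5, 2, 3, 9, 4, 6]
print(lis_length(example), distance_from_sortedness(example))  # prints "5 2"
```

In the example, 1, 2, 3, 4, 6 is a longest increasing subsequence, so deleting 5 and 9 sorts the sequence.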

Similar resources

Lecture 15: Sortedness, Connectivity, MST Weight, Components

which consists of an increasing sequence of m decreasing sequences, each of length n/m. The longest increasing subsequence has length m, but the only way for the sample to imply that the input is not sorted is if two chosen elements land in the same decreasing sequence. The probability that this happens is at most $\binom{s}{2}/m \le s^2/(2m)$, which is $o(1)$ if s is $o(\sqrt{m})$. In particular, for ε = 1/2 ...
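The construction in this excerpt can be tried out directly. The sketch below builds an increasing sequence of m decreasing blocks and estimates how often a uniform sample of s positions contains an inversion. The function names, the sampling without replacement, and the parameter values are assumptions made for illustration; they are not taken from the lecture notes.

```python
import random
from itertools import combinations


def blocks_instance(n, m):
    """Increasing sequence of m decreasing blocks, each of length n // m."""
    size = n // m
    seq = []
    for b in range(m):
        # Every value in block b exceeds every value in block b - 1,
        # but values run downward inside the block.
        seq.extend(range((b + 1) * size - 1, b * size - 1, -1))
    return seq


def sample_sees_inversion(seq, positions):
    """True if some pair of sampled positions is out of order."""
    pos = sorted(positions)
    return any(seq[i] > seq[j] for i, j in combinations(pos, 2))


def detection_rate(n, m, s, trials=2000, seed=0):
    """Fraction of random s-position samples that reveal unsortedness."""
    rng = random.Random(seed)
    seq = blocks_instance(n, m)
    hits = sum(
        sample_sees_inversion(seq, rng.sample(range(len(seq)), s))
        for _ in range(trials)
    )
    return hits / trials


print(blocks_instance(12, 3))
# Empirical detection rate, to compare with the union bound C(s, 2) / m.
print(detection_rate(n=1000, m=100, s=3), 3 / 100)
```

Two sampled positions form an inversion exactly when they fall in the same block, so the detection rate stays small until s approaches the square root of m.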

On Differentially Private Longest Increasing Subsequence Computation in Data Stream

Many important applications require continuous computation of statistics over data streams. Activity monitoring, surveillance, and fraud detection are some settings where it is crucial for the monitoring applications to protect users' sensitive information in addition to efficiently computing the required statistics. In the last two decades, a broad range of techniques for time-series and str...

Measuring the Similarity of Trajectories Using Fuzzy Theory

In recent years, with the advancement of positioning systems, access to a large amount of movement data is provided. Among the methods of discovering knowledge from this type of data is to measure the similarity of trajectories resulting from the movement of objects. Similarity measurement has also been used in other data mining methods such as classification and clustering and is currently, an...

Application of Data-Mining Algorithms in the Sensitivity Analysis and Zoning of Areas Prone to Gully Erosion in the Indicator Watersheds of Khorasan Razavi Province

Extended abstract. 1. Introduction. Gully erosion is one of the most important sources of sediment in watersheds and a common phenomenon in semi-arid climates that affects vast areas with different morphological, soil and climatic conditions. This type of erosion is very dangerous due to the transfer of fertile soil horizons, and the reduction of water holding capacity also is a factor for s...

Space efficient streaming algorithms for the distance to monotonicity and asymmetric edit distance

Approximating the length of the longest increasing sequence (LIS) of an array is a well-studied problem. We study this problem in the data stream model, where the algorithm is allowed to make a single left-to-right pass through the array and the key resource to be minimized is the amount of additional memory used. We present an algorithm which, for any δ > 0, given streaming access to an array ...

Publication date: 2006